369 research outputs found

ElliPro: a new structure-based tool for the prediction of antibody epitopes

Author: Alessandro Sette
B Peters
Bjoern Peters
D Schneidman-Duhovny
E Westhof
H Neuvirth
HM Berman
Huynh-Hoa Bui
J Novotny
JA Greenbaum
JG Mandell
JM Thornton
Julia Ponomarenko
JV Ponomarenko
MHV Van Regenmortel
MJ Gomara
MS Bijker
Nicholas Fusseder
P Haste Andersen
Philip E Bourne
SF Altschul
T Fawcett
U Kulkarni-Kale
WD Bradford Jr
Wei Li
WG Laver
WR Taylor
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Reliable prediction of antibody, or B-cell, epitopes remains challenging yet highly desirable for the design of vaccines and immunodiagnostics. A correlation between antigenicity, solvent accessibility, and flexibility in proteins was demonstrated. Subsequently, Thornton and colleagues proposed a method for identifying continuous epitopes in the protein regions protruding from the protein's globular surface. The aim of this work was to implement that method as a web tool and evaluate its performance on discontinuous epitopes known from the structures of antibody-protein complexes.
Results: Here we present ElliPro, a web tool that implements Thornton's method and, together with a residue clustering algorithm, the MODELLER program and the Jmol viewer, allows the prediction and visualization of antibody epitopes in a given protein sequence or structure. ElliPro has been tested on a benchmark dataset of discontinuous epitopes inferred from 3D structures of antibody-protein complexes. In comparison with six other structure-based methods that can be used for epitope prediction, ElliPro performed the best and gave an AUC value of 0.732 when the most significant prediction was considered for each protein. Since the rank of the best prediction was in the top three for more than 70% of proteins and never exceeded five, ElliPro is considered a useful research tool for identifying antibody epitopes in protein antigens. ElliPro is available at http://tools.immuneepitope.org/tools/ElliPro.
Conclusion: The results from ElliPro suggest that further research on antibody epitopes considering more features that discriminate epitopes from non-epitopes may further improve predictions. As ElliPro is based on the geometrical properties of protein structure and does not require training, it might be more generally applied for predicting different types of protein-protein interactions.
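The AUC value quoted in this abstract can be read as the probability that a randomly chosen epitope residue receives a higher score than a randomly chosen non-epitope residue. The Python sketch below computes AUC in that rank-based way. The function name `roc_auc` and the toy scores are illustrative assumptions; this is not ElliPro code and does not reproduce its benchmark.

```python
def roc_auc(scores, labels):
    """Rank-based AUC: P(score of a positive > score of a negative).

    `labels` uses 1 for epitope residues and 0 for non-epitope residues
    (an assumed encoding). Ties between a positive and a negative count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy residue scores (hypothetical): two epitope and three non-epitope residues.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

The pairwise loop is quadratic in the number of residues, which is acceptable for a single protein; a sort-based formulation would be preferable for large datasets.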

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Prediction of conformational B-cell epitopes from 3D structures by random forests with a distance-based feature

Author: AS Kolaskar
B Rost
DR Flower
EA Emini
G Riddick
G Walter
HR Ansari
Hua Zou
J Chen
J Huang
J Larsen
J Mintseris
J Pellequer
J Ponomarenko
J Sollner
J Sun
J Wu
JM Parker
Juan Liu
JV Ponomarenko
L Breiman
M Sikić
Mark Hall
Meng Zhao
MH Van Regenmortel
MJ Blythe
MJ Sweredoski
ND Rubinstein
P Jain
PA Karplus
PH Andersen
R Liu
S Liang
S Saha
SR Comeau
W Kabsch
Wen Zhang
Xinghuo Ye
Y El-Manzalawy
Yi Xiong
ZP Liu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Antigen-antibody interactions are key events in the immune system and provide important clues to immune processes and responses. In antigen-antibody interactions, the specific sites on the antigens that are directly bound by B-cell-produced antibodies are known as B-cell epitopes. The identification of epitopes is a hot topic in bioinformatics because of their potential use in epitope-based drug design. Although most B-cell epitopes are discontinuous (or conformational), insufficient effort has been put into conformational epitope prediction, and the performance of existing methods is far from satisfactory.
Results: To develop a high-accuracy model, we focus on several aspects that may affect prediction performance: the impact of interior residues, the different contributions of adjacent residues, and the imbalanced data, which contain many more non-epitope residues than epitope residues. To address these issues, we adopt the following strategies. First, a concept of 'thick surface patch' instead of 'surface patch' is introduced to describe the local spatial context of each surface residue, which takes the impact of interior residues into account. The comparison between the thick surface patch and the surface patch shows that interior residues contribute to the recognition of epitopes. Second, a statistically significant difference is observed between the distance distributions of non-epitope patches and epitope patches, so an adjacent residue distance feature is presented, which reflects the unequal contributions of adjacent residues to the location of binding sites. Third, a bootstrapping and voting procedure is adopted to deal with the imbalanced dataset. Based on these ideas, we propose a new method to identify B-cell conformational epitopes from 3D structures by combining conventional features with the proposed feature, using the random forest (RF) algorithm as the classification engine. The experiments show that our method can predict conformational B-cell epitopes with high accuracy. Evaluated by leave-one-out cross validation (LOOCV), our method achieves a mean AUC value of 0.633 for the benchmark bound dataset and a mean AUC value of 0.654 for the benchmark unbound dataset. When compared with state-of-the-art prediction models in the independent test, our method demonstrates comparable or better performance.
Conclusions: Our method is demonstrated to be effective for the prediction of conformational epitopes. Based on the study, we develop a tool to predict conformational epitopes from 3D structures, available at http://code.google.com/p/my-project-bpredictor/downloads/list.
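The abstract does not spell out its bootstrapping and voting procedure, so the following Python sketch shows only one common way to handle class imbalance along those lines: draw equal-sized bootstrap samples from each class, train one model per sample, and take a majority vote. A nearest-centroid classifier stands in for the random forest to keep the example dependency-free. All function names, the balanced-sampling choice, and the toy feature vectors are assumptions, not the authors' implementation.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_balanced_ensemble(pos, neg, n_models=11, seed=0):
    """Train one nearest-centroid model per balanced bootstrap sample.

    Each sample draws (with replacement) the same number of epitope and
    non-epitope examples, so the majority class cannot dominate a model.
    """
    rng = random.Random(seed)
    k = min(len(pos), len(neg))  # size of each per-class bootstrap sample
    models = []
    for _ in range(n_models):
        pos_sample = [rng.choice(pos) for _ in range(k)]
        neg_sample = [rng.choice(neg) for _ in range(k)]
        models.append((centroid(pos_sample), centroid(neg_sample)))
    return models


def predict(models, x):
    """Majority vote: 1 (epitope) if most models place x nearer the positive centroid."""
    votes = sum(sq_dist(x, cp) < sq_dist(x, cn) for cp, cn in models)
    return 1 if votes * 2 > len(models) else 0


# Hypothetical two-feature data: 2 epitope residues versus 6 non-epitope residues.
epitope = [[1.0, 1.0], [0.9, 1.1]]
non_epitope = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1],
               [0.0, 0.2], [0.2, 0.0], [0.1, 0.1]]
ensemble = train_balanced_ensemble(epitope, non_epitope)
```

An odd number of models avoids tied votes. Replacing the centroid pair with a trained random forest per sample would bring the sketch closer to the setup the abstract describes.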

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

G+C content dominates intrinsic nucleosome occupancy

Author: A Barbic
A Groth
A Thastrom
A Valouev
AV Sivolob
B Efron
B Li
B Suter
BE Bernstein
CK Lee
CR Calladine
Desiree Tillo
E Segal
EA Sekinger
F Ozsolak
GC Yuan
H Cao
HE Peckham
HR Drew
I Brukner
I Ioshikhes
IP Ioshikhes
JC Dohm
JP Thiery
JV Ponomarenko
K Luger
M Gardiner-Garden
MY Tolstorukov
N Kaplan
PA Rice
R Tibshirani
S Aerts
S Schwartz
SC Satchwell
Timothy R Hughes
V Miele
W Lee
Y Field
YH Wang
Publication venue: BioMed Central
Publication date: 01/12/2009

Background: The relative preference of nucleosomes to form on individual DNA sequences plays a major role in genome packaging. A wide variety of DNA sequence features are believed to influence nucleosome formation, including periodic dinucleotide signals, poly-A stretches and other short motifs, and sequence properties that influence DNA structure, including base content. It was recently shown by Kaplan et al. that a probabilistic model using composition of all 5-mers within a nucleosome-sized tiling window accurately predicts intrinsic nucleosome occupancy across an entire genome in vitro. However, the model is complicated, and it is not clear which specific DNA sequence properties are most important for intrinsic nucleosome-forming preferences.
Results: We find that a simple linear combination of only 14 simple DNA sequence attributes (G+C content, two transformations of dinucleotide composition, and the frequency of eleven 4-bp sequences) explains nucleosome occupancy in vitro and in vivo in a manner comparable to the Kaplan model. G+C content and frequency of AAAA are the most important features. G+C content is dominant, alone explaining ~50% of the variation in nucleosome occupancy in vitro.
Conclusions: Our findings provide a dramatically simplified means to predict and understand intrinsic nucleosome occupancy. G+C content may dominate because it both reduces frequency of poly-A-like stretches and correlates with many other DNA structural characteristics. Since G+C content is enriched or depleted at many types of features in diverse eukaryotic genomes, our results suggest that variation in nucleotide composition may have a widespread and direct influence on chromatin structure.
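The two features this abstract singles out, G+C content and the frequency of AAAA, are simple to compute over a tiling window. The Python sketch below shows one way to do so. The function names, the 147-bp default window (an assumed nucleosome-sized value), and the test sequences are illustrative; the paper's fitted coefficients are not reproduced here.

```python
def gc_content(seq):
    """Fraction of G or C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def kmer_frequency(seq, kmer):
    """Overlapping occurrences of `kmer` divided by the number of k-mer positions."""
    seq, kmer = seq.upper(), kmer.upper()
    positions = len(seq) - len(kmer) + 1
    if positions <= 0:
        raise ValueError("sequence shorter than k-mer")
    hits = sum(seq[i:i + len(kmer)] == kmer for i in range(positions))
    return hits / positions


def windowed_features(seq, window=147, step=1):
    """Yield (start, G+C content, AAAA frequency) for each full window."""
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, gc_content(chunk), kmer_frequency(chunk, "AAAA")
```

A linear model of the kind described would then take a weighted sum of such per-window features; the weights would have to be fitted to occupancy data.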

University of Toronto Research Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

GRISOTTO: A greedy approach to improve combinatorial algorithms for motif discovery with prior knowledge

Author: A Valouev
Alexandra M Carvalho
AM Carvalho
AP Fejes
Arlindo L Oliveira
C Deremble
C Lee
CT Harbison
D Ucar
E Segal
E Valen
F Daenen
G Paillard
G Pavesi
GC Yuan
I Lafontaine
IV Kulakovskiy
JV Ponomarenko
KD MacIsaac
L Marsan
L Narlikar
M Hu
M Kellis
MF Sagot
N Pisanti
R Gordân
R Pudimat
R Siddharthan
RA O'Flanagan
RG Beiko
S Sinha
T Wang
TL Bailey
V Matys
WW Wasserman
X Chen
Y Liu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Position-specific priors (PSPs) have been used with success to boost EM and Gibbs sampler-based motif discovery algorithms. PSP information has been computed from different sources, including orthologous conservation, DNA duplex stability, and nucleosome positioning. Prior information has not yet been used in the context of combinatorial algorithms. Moreover, priors have been used only independently, and the gain of combining priors from different sources has not yet been studied.
Results: We extend RISOTTO, a combinatorial algorithm for motif discovery, by post-processing its output with a greedy procedure that uses prior information. PSPs from different sources are combined into a scoring criterion that guides the greedy search procedure. The resulting method, called GRISOTTO, was evaluated over 156 yeast TF ChIP-chip sequence-sets commonly used to benchmark prior-based motif discovery algorithms. Results show that GRISOTTO is at least as accurate as twelve other state-of-the-art approaches for the same task, even without combining priors. Furthermore, by considering combined priors, GRISOTTO is considerably more accurate than the state-of-the-art approaches for the same task. We also show that PSPs improve GRISOTTO's ability to retrieve motifs from mouse ChIP-seq data, indicating that the proposed algorithm can be applied to data from a different technology and for a higher eukaryote.
Conclusions: The conclusions of this work are twofold. First, post-processing the output of combinatorial algorithms by incorporating prior information leads to a very efficient and effective motif discovery method. Second, combining priors from different sources is even more beneficial than considering them separately.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Automated functional classification of experimental and predicted protein structures

Author: A Andreeva
A Godzik
A Stark
AG Murzin
AR Ortiz
B Zhang
D Fischer
D Pal
D Xu
EC Webb
EF Pettersen
F Pazos
GJ Bartlett
H Hegyi
HM Berman
IN Shindyalov
J Gough
JA Di Gennaro
JC Whisstock
JD Thompson
JD Watson
JM Bujnicki
JM Chandonia
JS Fetrow
JV Ponomarenko
K Ginalski
K Pawlowski
K Wang
Kai Wang
L Liao
L Rychlewski
L Xie
LH Hung
M Ashburner
MJ Ondrechen
N Nagano
R Kuang
Ram Samudrala
S Cheek
SE Brenner
SF Altschul
SK Burley
SR Eddy
WR Pearson
Publication venue: BioMed Central
Publication date: 01/01/2006

Background: Proteins that are similar in sequence or structure may perform different functions in nature. In such cases, function cannot be inferred from sequence or structural similarity.
Results: We analyzed experimental structures belonging to the Structural Classification of Proteins (SCOP) database and showed that about half of them belong to multi-functional fold families for which protein similarity alone is not adequate to assign function. We also analyzed predicted structures from the LiveBench and the PDB-CAFASP experiments and showed that accurate homology-based functional assignments cannot be achieved approximately one third of the time, when the protein is a member of a multi-functional fold family. We then conducted extended performance evaluation and comparisons on both experimental and predicted structures using our Functional Signatures from Structural Alignments (FSSA) algorithm that we previously developed to handle the problem of classifying proteins belonging to multi-functional fold families.
Conclusion: The results indicate that the FSSA algorithm has better accuracy when compared to homology-based approaches for functional classification of both experimental and predicted protein structures, in part due to its use of local, as opposed to global, information for classifying function. The FSSA algorithm has also been implemented as a webserver and is available at

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Comparative immunogenicity and structural analysis of epitopes of different bacterial L-asparaginases

Author: A Abuchowski
A Khan
A Leal-Egana
AI Goldberg
AL Swain
B Zalewska-Szewczyk
BL Asselin
D Cappelletti
EF Pettersen
F Sievers
G Qian
H Lazarus
Ilya N. Dyakov
J Jean-Francois
J Krasotkina
J Pei
J Ponomarenko
JC Jorge
JV Kringelum
K Spiess
LM Vrooman
Marat D. Kazanov
Marina V. Pokrovskaya
MM Gaspar
MM Gasper
MV Pokrovskaia
MV Pokrovskaya
ND Rubinstein
RP Warrell Jr
S Liang
Svetlana S. Aleksandrova
Vadim S. Pokrovsky
W Kabsch
Y Zhang
YM Kwon
YQ Zhang
ZB Moola
Publication venue: Springer Science and Business Media LLC

Crossref

A systematic, large-scale comparison of transcription factor binding site models

Background: The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs), and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these are capable of accurately representing in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified “real” in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls we chose random downstream exonic sequences, which are unlikely to harbour TFBSs. All models were assessed by receiver operating characteristic (ROC) analysis.
Results: While the area under the curve was low for most of the tested models, with only 47% reaching a score of 0.7 or higher, we noticed strong differences between the various position-specific scoring matrices, with JASPAR and HT-SELEX models showing higher success rates than PBM-derived models. In addition, we found that while TFBS sequences showed a higher degree of conservation than randomly chosen sequences, there was high variability between individual TFBSs.
Conclusions: Our results show that only a few of the matrix-based models used to predict potential TFBSs are able to reliably detect experimentally confirmed TFBSs. We compiled our findings in a freely accessible web application called ePOSSUM (http://mutationtaster.charite.de/ePOSSUM/), which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. Additionally, ePOSSUM provides information on the reliability of the prediction using our test set of experimentally confirmed binding sites.

Institutional Repository of the Freie Universität Berlin

Crossref

Springer - Publisher Connector

PubMed Central

Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions

Author: A Hoglund
AE Kel
AE Vinogradov
B Efron
B Jaruga
BJ Deroo
C Burge
CD Schmid
CR Calladine
D Cai
D GuhaThakurta
DM Graunke
E Fayard
Elena A Ananko
Elena V Ignatieva
FA Wright
GD Stormo
HP Ko
I Abnizova
I Ben-Gal
IA Udalova
Igor I Turnaev
J Duarte
J Hu
JV Ponomarenko
K Ellrott
K Morohashi
K Quandt
KJ Campbell
L Quintana-Murci
LC Platanias
LG Cowell
M Beato
M Blanchette
M Costantini
M Ganapathi
M Lohoff
M Stepanova
M-LT Lee
ML Bulyk
MP Ponomarenko
MQ Zhang
NA Kolchanov
NI Gershenzon
Nikolay A Kolchanov
NV Klimova
O Kel-Margoulis
OA Podkolodnaia
OD King
OG Berg
P Val
PV Benos
Q Zhou
R Castelo
R Kiyama
R Osada
R Pudimat
RV Davuluri
S Kamalakaran
Tatyana I Merkulova
TC Hodgman
TK Man
TM Chen
TV Busygina
VG Levitskii
VG Levitsky
Victor G Levitsky
VV Solovyev
W Huang
WH Shen
WW Wasserman
X Xie
Y Barash
Publication venue: BioMed Central
Publication date: 01/12/2007

Background: Reliable transcription factor binding site (TFBS) prediction methods are essential for computer annotation of large amounts of genome sequence data. However, current methods to predict TFBSs are hampered by the high false-positive rates that occur when only sequence conservation at the core binding sites is considered.
Results: To improve this situation, we have quantified the performance of several Position Weight Matrix (PWM) algorithms, using exhaustive approaches to find their optimal length and position. We applied these approaches to biomedically important TFBSs involved in the regulation of cell growth and proliferation as well as in inflammatory, immune, and antiviral responses (NF-κB, ISGF3, IRF1, STAT1), obesity and lipid metabolism (PPAR, SREBP, HNF4), and regulation of steroidogenic (SF-1) and cell cycle (E2F) gene expression. We have also gained extra specificity using a method, entitled SiteGA, which takes into account structural interactions within TFBS core and flanking regions, using a genetic algorithm (GA) with a discriminant function of locally positioned dinucleotide (LPD) frequencies. To ensure higher confidence in our approach, we applied jackknife and bootstrap resampling tests for the comparison; optimized PWM and SiteGA showed similar recognition performance. Then we applied SiteGA and optimized PWMs (both separately and together) to sequences in the Eukaryotic Promoter Database (EPD). The resulting SiteGA recognition models can now be used to search sequences for binding sites using the web tool, SiteGA. Analysis of dependencies between close and distant LPDs revealed by SiteGA models has shown that the most significant correlations are between close LPDs and are generally located in the core (footprint) region. A greater number of less significant correlations are mainly between distant LPDs, which span both core and flanking regions. Applying SiteGA and optimized PWM models together substantially reduced false positives, at least at higher stringencies.
Conclusion: Based on this analysis, SiteGA adds substantial specificity even to optimized PWMs and may be considered for large-scale genome analysis. It adds to the range of techniques available for TFBS prediction, and EPD analysis has led to a list of genes which appear to be regulated by the above TFs.
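The Position Weight Matrix approach mentioned in this abstract scores each window of a sequence by summing per-position log-odds values. The Python sketch below builds a PWM from a few aligned sites and scans a sequence with it. The pseudocount, the uniform 0.25 background, the function names, and the toy sites are all assumptions; this is a generic PWM sketch, not the optimized PWMs or the SiteGA method evaluated in the paper.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM (list of per-position dicts) from aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    total = len(sites) + 4 * pseudocount  # column count plus pseudocounts
    pwm = []
    for pos in range(length):
        column = [site[pos].upper() for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score_window(pwm, window):
    """Sum of log-odds over a window of the PWM's length (A/C/G/T only)."""
    return sum(col[base] for col, base in zip(pwm, window.upper()))


def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window in `seq`."""
    width = len(pwm)
    return max(
        (score_window(pwm, seq[i:i + width]), i)
        for i in range(len(seq) - width + 1)
    )


# Hypothetical aligned binding sites, for illustration only.
toy_pwm = build_pwm(["TGAC", "TGAC", "TGAT"])
```

In practice a score threshold decides which windows are reported as sites, and the false-positive rate the abstract discusses depends strongly on that threshold. Bases other than A, C, G, and T raise a `KeyError` in this sketch.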

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Prediction of Peptide Reactivity with Human IVIg through a Knowledge-Based Approach

Author: A Tiengo
Alessandra Tiengo
AS Kolaskar
B Korber
C Lundegaard
CA Janeway Jr
D Castelletti
E Frank
EA Emini
GH John
H Akaike
H Neuvirth
I Dimitrov
I Kufareva
J Garnier
J Janin
J Novotny
J Sollner
JA Greenbaum
JE Larsen
JL Pellequer
JM Parker
JR Quinlan
JV Ponomarenko
KW Jørgensen
L Liu
M Di Brino
M Landau
M Levitt
Mark Isalan
MJ Blythe
N Rapin
Nicola Barbarini
O Carugo
P Haste Andersen
P Lorenz
PA Karplus
PY Chou
R Bellazzi
R Bhaskaran
R Chen
R Hua
R Quinlan
R Tibshirani
R Vita
Riccardo Bellazzi
S Kaczanowski
S Saha
SB Needleman
T Schwede
TD Schneider
TF Smith
TP Hopp
U Kulkarni-Kale
Y Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2011

The prediction of antibody-protein (antigen) interactions is very difficult because of the huge variability that characterizes the structure of antibodies. The region of the antigen bound by an antibody is called the epitope. Experimental data indicate that many antibodies react with a panel of distinct epitopes (positive reaction). Challenge 1 of DREAM5 aims at understanding whether there exist rules for predicting the reactivity of a peptide/epitope, i.e., its capability to bind to human antibodies. DREAM5 provided a training set of peptides with experimentally identified high and low reactivities to human antibodies. On the basis of this training set, the participants in the challenge were asked to develop a predictive model of reactivity. A test set was then provided to evaluate the performance of the model implemented so far.

Public Library of Science (PLOS)

Crossref

Archivio Istituzionale della Ricerca - Università degli Studi di Pavia

Directory of Open Access Journals

PubMed Central